Online-Academy
Look, Read, Understand, Apply

Data Mining And Data Warehousing

k-medoids clustering algorithm

K-medoids

The K-Medoids clustering algorithm is similar to K-Means, but instead of using means (averages) to define cluster centers, it uses medoids — actual data points that are most centrally located within a cluster. This makes K-Medoids more robust to noise and outliers than K-Means.
Working of K-Medoids
  • Initialize: randomly choose k data points as the initial medoids (actual representative data points).
  • Assign: assign each data point to the nearest medoid (using a distance metric).
  • Update: for each medoid, try swapping it with a non-medoid point and check whether the total cost (sum of distances to nearest medoids) decreases. If yes, perform the swap.
  • Repeat: repeat the assign and update steps until no swap improves the cost (the medoids stop changing).
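The steps above can be sketched in Python. This is a minimal, illustrative PAM-style implementation, not a production version; the function and variable names are assumptions chosen for this example, and Manhattan distance is used simply to show that the metric is pluggable.

```python
import random

def manhattan(a, b):
    # Any distance metric can be substituted here (Euclidean, etc.).
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids, dist):
    # Sum of each point's distance to its nearest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, dist=manhattan, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)            # Initialize: k actual data points
    cost = total_cost(points, medoids, dist)
    improved = True
    while improved:                            # Repeat until no swap helps
        improved = False
        for i in range(k):
            for p in points:                   # Update: try medoid/non-medoid swaps
                if p in medoids:
                    continue
                candidate = medoids[:i] + [p] + medoids[i + 1:]
                new_cost = total_cost(points, candidate, dist)
                if new_cost < cost:            # Keep the swap only if cost drops
                    medoids, cost = candidate, new_cost
                    improved = True
    # Assign: label each point by its nearest medoid
    labels = [min(range(k), key=lambda j: dist(p, medoids[j])) for p in points]
    return medoids, labels

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, labels = k_medoids(points, 2)
```

Note that every medoid returned is one of the input points, which is what makes the method robust to outliers: an extreme value can never become a cluster center the way it can drag a mean.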
Advantages
  • Robust to outliers since it uses actual data points.
  • Works with arbitrary distance metrics (Euclidean, Manhattan, etc.).
  • Better than K-Means when data has categorical or mixed types.
Disadvantages
  • Slower than K-Means, especially on large datasets (because of pairwise comparisons).
  • Still requires k to be known beforehand.
  • Not ideal for very high-dimensional data unless optimized.